Docket No.: 1454.1218
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

In re the Application of: 
Josef BAUER et al. 

Serial No. Group Art Unit: 

Confirmation No. 

Filed: (concurrently) Examiner: 

For: SYSTEM FOR DETECTING AND EVALUATING WORD SPEECH SIGNALS 

REPRESENTING A WORD FROM A USER OF A SPEECH RECOGNITION SYSTEM 
(as amended) 

PRELIMINARY AMENDMENT 

Assistant Commissioner for Patents 
Washington, D.C. 20231 

Sir: 

Before examination of the above-identified application, please amend the application as 

follows: 

IN THE TITLE: 

Please DELETE the Title in its entirety and REPLACE with the following new Title. 

-- SYSTEM FOR DETECTING AND EVALUATING WORD SPEECH SIGNALS
REPRESENTING A WORD FROM A USER OF A SPEECH RECOGNITION SYSTEM --.

IN THE SPECIFICATION: 

Please REPLACE the pending specification with the substitute specification attached 

hereto. 

IN THE CLAIMS: 

Please cancel without prejudice or disclaimer claims 1-13 in the underlying PCT 
application and ADD new claims 14-54 in accordance with the following: 

14. (NEW) A method for detecting and evaluating word speech signals representing a 
word from a user of a speech recognition system, comprising: 

detecting acoustic word speech signals from a user;

carrying out a speech recognition operation using a first vocabulary;

assessing probability of correct speech recognition; 

prompting the user to spell out each word for which the probability of correct 
speech recognition does not reach a first desired probability; 

detecting and evaluating letter signals as input by the user; 

carrying out a word recognition operation, after said detecting and evaluating of 
respective letter signals representing a single letter, using a second vocabulary larger than the 
first vocabulary; 

assessing the probability of correct word recognition; and 

terminating spelling and outputting a word obtained with a second desired 
probability by said assessing the probability of correct word recognition. 



15. (NEW) The method as claimed in claim 14, wherein the word recognition operation
includes

assigning a letter recognition probability based on the letter speech signals; and

determining a word list of all words in the second vocabulary having a letter
recognition probability not lower than a highest determined letter recognition probability for any
word, minus a first threshold value.

16. (NEW) The method as claimed in claim 15,

wherein said assessing the probability of correct word recognition comprises 
determining whether the word list contains only a single word, and 

wherein said terminating spelling and outputting the word is performed if only a 
single word is contained in the word list. 



17. (NEW) The method as claimed in claim 16, 
further comprising: 

carrying out speech recognition of the word speech signals using the 
word list with each word assigned a speech recognition probability; and 

determining whether a highest speech recognition probability and a 
second highest speech recognition probability differ from one another by a predetermined 
threshold value; and 

wherein if the predetermined threshold value is exceeded by a difference 
between the highest and second highest speech recognition probabilities, said terminating 
spelling and outputting the word is performed for the word in the word list with the highest
speech recognition probability. 



18. (NEW) A method for detecting and evaluating word speech signals representing a 
word from the user of a speech recognition system, comprising: 

detecting acoustic word speech signals from a user; 

carrying out a speech recognition operation; 

assessing probability of correct speech recognition; 

prompting the user to spell out each word for which the probability of correct 
speech recognition does not reach a first desired probability; 

detecting and evaluating letter signals as input by the user; 

carrying out a word recognition operation, after said detecting and evaluating of
respective letter signals representing a single letter;

assessing the probability of correct word recognition;

terminating spelling and outputting a word obtained with a second desired
probability by said assessing the probability of correct word recognition; and

carrying out speech recognition of the word speech signals using the letter 
signals as detected and evaluated, if the correct word recognition is not obtained with the 
second desired probability.

19. (NEW) A method for detecting and evaluating word speech signals representing a
word from a user of a speech recognition system, comprising: 

detecting acoustic word speech signals from a user; 

carrying out a speech recognition operation to obtain a speech recognition
probability;

assessing probability of correct speech recognition; 

prompting the user to spell out each word for which the probability of correct 
speech recognition does not reach a first desired probability; 

detecting and evaluating letter signals as input by the user of at least one letter, 
to obtain a letter recognition probability based on each detected letter signal; 

carrying out a word recognition operation, after said detecting and evaluating of 
respective letter signals representing a single letter, based on a combined recognition probability 
using the letter recognition probability and the speech recognition probability; 

assessing the probability of correct word recognition; and

terminating spelling and outputting a word if the word is obtained with a second
desired probability by said assessing the probability of correct word recognition. 



20. (NEW) The method as claimed in claim 19, further comprising generating a word list 
based on the combined recognition probability. 

21. (NEW) The method as claimed in claim 20, wherein said terminating spelling and
outputting the word is based solely on a single interrogation as to whether the combined 
recognition probability is the second desired recognition probability. 

22. (NEW) The method as claimed in claim 21, wherein said terminating spelling and
outputting the word includes

outputting an appropriate message to the user; and

terminating said detecting of the acoustic word speech signals.

23. (NEW) The method as claimed in claim 22, further comprising, after said detecting
and evaluating of the letter speech signals respectively representing a letter: 

determining whether the user is continuing to speak; 

continuing said detecting and evaluating and the word recognition operation for 
next speech signals respectively representing a letter, if the user continues to speak; and 

outputting one of the word list and a predetermined number of the words with 
highest probabilities in the word list, if the user does not continue to speak. 

24. (NEW) A device for detecting and evaluating word speech signals representing a 
word from a user of a speech recognition system, comprising: 

speech detection means for detecting acoustic word speech signals from a user; 
initial speech recognition means for carrying out a speech recognition operation 
using a first vocabulary; 

speech assessment means for assessing probability of correct speech 

recognition; 

means for prompting the user to spell out each word for which the probability of 
correct speech recognition does not reach a first desired probability; 

letter detection means for detecting and evaluating letter signals as input by the 

user;

word recognition means for carrying out a word recognition operation, after said
letter detection means has evaluated respective letter signals representing a single letter, using 
a second vocabulary larger than the first vocabulary; 

word assessment means for assessing the probability of correct word 
recognition; and 

termination means for terminating spelling and outputting a word obtained with a 
second desired probability by said word assessment means. 

25. (NEW) The device as claimed in claim 24, wherein said word recognition means 
includes 

means for assigning a letter recognition probability based on the letter speech
signals; and

means for determining a word list of all words in the second vocabulary having a
letter recognition probability not lower than a highest determined letter recognition probability for
any word, minus a first threshold value.

26. (NEW) The device as claimed in claim 25,

wherein said word assessment means comprises means for determining whether
the word list contains only a single word, and

wherein said termination means terminates spelling and outputs the word if only a
single word is contained in the word list.



27. (NEW) The device as claimed in claim 26, 
further comprising: 

supplemental speech recognition means for carrying out speech 
recognition of the word speech signals using the word list with each word assigned a speech 
recognition probability; and 

means for determining whether a highest speech recognition probability 
and a second highest speech recognition probability differ from one another by a predetermined 
threshold value; and 

wherein if the predetermined threshold value is exceeded by a difference 
between the highest and second highest speech recognition probabilities, said termination 
means terminates spelling and outputs the word in the word list with the highest speech 
recognition probability.

28. (NEW) A device for detecting and evaluating word speech signals representing a
word from the user of a speech recognition system, comprising: 

speech detection means for detecting acoustic word speech signals from a user; 
initial speech recognition means for carrying out a speech recognition operation; 
speech assessment means for assessing probability of correct speech 

recognition; 

means for prompting the user to spell out each word for which the probability of 
correct speech recognition does not reach a first desired probability; 

letter detection means for detecting and evaluating letter signals as input by the 

user; 

word recognition means for carrying out a word recognition operation, after said 
letter detection means has evaluated respective letter signals representing a single letter;

word assessment means for assessing the probability of correct word
recognition;

termination means for terminating spelling and outputting a word obtained with a
second desired probability by said word assessment means; and

supplemental speech means for carrying out speech recognition of the word
speech signals using the letter signals as detected and evaluated, if the correct word recognition
is not obtained with the second desired probability.

29. (NEW) A device for detecting and evaluating word speech signals representing a
word from a user of a speech recognition system, comprising: 

speech detection means for detecting acoustic word speech signals from a user; 

speech recognition means for carrying out a speech recognition operation to 
obtain a speech recognition probability; 

speech assessment means for assessing probability of correct speech 

recognition; 

means for prompting the user to spell out each word for which the probability of 
correct speech recognition does not reach a first desired probability; 

letter detection means for detecting and evaluating letter signals as input by the 
user of at least one letter, to obtain a letter recognition probability based on each detected letter 
signal;

word recognition means for carrying out a word recognition operation, after said
letter detection means has evaluated respective letter signals representing a single letter, based 
on a combined recognition probability using the letter recognition probability and the speech 
recognition probability; 

word assessment means for assessing the probability of correct word 
recognition; and 

termination means for terminating spelling and outputting a word if the word is 
obtained with a second desired probability by said word assessment means. 



30. (NEW) The device as claimed in claim 29, further comprising means for generating 
a word list based on the combined recognition probability. 



31. (NEW) The device as claimed in claim 30, wherein said termination means bases
termination of spelling and outputting of the word on a single interrogation as to whether the
combined recognition probability is the second desired recognition probability.

32. (NEW) The device as claimed in claim 31, wherein said termination means
comprises:

means for outputting an appropriate message to the user; and

means for terminating detection of the acoustic word speech signals.

33. (NEW) The device as claimed in claim 32, further comprising:

means for determining whether the user is continuing to speak; 

means for continuing the detection, the evaluation and the word recognition 
operation for next speech signals respectively representing a letter, if the user continues to 
speak; and 

means for outputting one of the word list and a predetermined number of the 
words with highest probabilities in the word list, if the user does not continue to speak. 

34. (NEW) A communication device for detecting and evaluating word speech signals 
representing a word from a user of a speech recognition system, comprising: 

a data bus; 

at least one memory device, coupled to said data bus, to store at least one 
vocabulary and at least one program;

a speech recognition processor, coupled to said data bus, to detect acoustic word
speech signals from a user and to carry out a speech recognition operation using a first 
vocabulary; 

a speech output device, coupled to said data bus, to produce audio signals 
simulating speech; and 

a central processor, coupled to said data bus, to assess probability of correct 
speech recognition, to generate first output signals causing said speech output device to prompt 
the user to spell out each word for which the probability of correct speech recognition does not 
reach a first desired probability, to evaluate letter signals as input by the user, to carry out a 
word recognition operation following evaluation of respective letter signals representing a single 
letter using a second vocabulary larger than the first vocabulary, to assess the probability of 
correct word recognition, and to terminate the evaluation of respective letter signals and
generate second output signals causing said speech output device to output a word obtained
with a second desired probability based upon assessment of the probability of correct word
recognition.

35. (NEW) The communication device as claimed in claim 34, wherein said central
processor also assigns a letter recognition probability based on the letter speech signals and
determines a word list of all words in the second vocabulary having a letter recognition
probability not lower than a highest determined letter recognition probability for any word, minus
a first threshold value.

36. (NEW) The communication device as claimed in claim 35, wherein said central 
processor assesses the probability of correct word recognition by determining whether the word 
list contains only a single word, and terminates spelling and causes output of the word if only a 
single word is contained in the word list. 



37. (NEW) The communication device as claimed in claim 36, wherein said central 
processor also carries out speech recognition of the word speech signals using the word list with 
each word assigned a speech recognition probability and determines whether a highest speech 
recognition probability and a second highest speech recognition probability differ from one 
another by a predetermined threshold value, and, if the predetermined threshold value is 
exceeded by a difference between the highest and second highest speech recognition 
probabilities, said central processor terminates spelling and causes output of the word in the
word list with the highest speech recognition probability. 



38. (NEW) A communication device for detecting and evaluating word speech signals 
representing a word from a user of a speech recognition system, comprising: 

a data bus; 

at least one memory device, coupled to said data bus, to store at least one 

program; 

a speech recognition processor, coupled to said data bus, to detect acoustic word 
speech signals from a user and to carry out a speech recognition operation; 

a speech output device, coupled to said data bus, to produce audio signals 
simulating speech; and

a central processor, coupled to said data bus, to assess probability of correct
speech recognition, to generate first output signals causing said speech output device to prompt
the user to spell out each word for which the probability of correct speech recognition does not
reach a first desired probability, to evaluate letter signals as input by the user, to carry out a
word recognition operation following evaluation of respective letter signals representing a single
letter, to assess the probability of correct word recognition, to terminate the evaluation of
respective letter signals and generate second output signals causing said speech output device
to output a word obtained with a second desired probability based upon assessment of the
probability of correct word recognition, and to carry out speech recognition of the word speech
signals using the letter signals as detected and evaluated, if the correct word recognition is not
obtained with the second desired probability.

39. (NEW) A communication device for detecting and evaluating word speech signals 
representing a word from a user of a speech recognition system, comprising: 

a data bus; 

at least one memory device, coupled to said data bus, to store at least one 

program; 

a speech recognition processor, coupled to said data bus, to detect acoustic word 
speech signals from a user and to carry out a speech recognition operation; 

a speech output device, coupled to said data bus, to produce audio signals 
simulating speech; and

a central processor, coupled to said data bus, to assess probability of correct
speech recognition, to generate first output signals causing said speech output device to prompt 
the user to spell out each word for which the probability of correct speech recognition does not 
reach a first desired probability, to evaluate letter signals as input by the user, to carry out a 
word recognition operation following evaluation of respective letter signals representing a single 
letter based upon assessment of a combined recognition probability using the letter recognition 
probability and the speech recognition probability, to assess the probability of correct word 
recognition, and to terminate the evaluation of respective letter signals and generate second 
output signals causing said speech output device to output a word obtained with a second 
desired probability based on the assessment of the combined recognition probability. 

40. (NEW) The communication device as claimed in claim 39, wherein said central 
processor also generates a word list based on the combined recognition probability. 



41. (NEW) The communication device as claimed in claim 40, wherein said central
processor terminates spelling and causes output of the word based solely on a single
interrogation as to whether the combined recognition probability is the second desired
recognition probability.

42. (NEW) The communication device as claimed in claim 41, wherein upon terminating
spelling and outputting the word, said central processor also outputs an appropriate message to
the user and terminates detection of the acoustic word speech signals.

43. (NEW) The communication device as claimed in claim 42, wherein, after detection 
and evaluation of the letter speech signals respectively representing a letter, said central 
processor also determines whether the user is continuing to speak, and if the user continues to 
speak the next speech signals respectively representing a letter are detected; while if the user 
does not continue to speak, said central processor causes outputting of one of the word list and 
a predetermined number of the words with highest probabilities in the word list. 

44. (NEW) The communication device as claimed in claim 43, wherein said 
communication device is connectable to telephone lines, 

further comprising a switching unit coupled to the telephone lines and said data 

bus. 






45. (NEW) An electronically readable data medium storing at least one computer 
program to control a processor to perform a method for detecting and evaluating word speech 
signals representing a word from a user of a speech recognition system, said method 
comprising: 

detecting acoustic word speech signals from a user; 

carrying out a speech recognition operation using a first vocabulary; 

assessing probability of correct speech recognition; 

prompting the user to spell out each word for which the probability of correct 
speech recognition does not reach a first desired probability; 

detecting and evaluating letter signals as input by the user;

carrying out a word recognition operation, after said detecting and evaluating of
respective letter signals representing a single letter, using a second vocabulary larger than the
first vocabulary;

assessing the probability of correct word recognition; and 
terminating spelling and outputting a word obtained with a second desired 
probability by said assessing the probability of correct word recognition. 



46. (NEW) The electronically readable data medium as claimed in claim 45, wherein the
word recognition operation includes

assigning a letter recognition probability based on the letter speech signals; and 
determining a word list of all words in the second vocabulary having a letter 
recognition probability not lower than a highest determined letter recognition probability for any 
word, minus a first threshold value. 

47. (NEW) The electronically readable data medium as claimed in claim 46, 
wherein said assessing the probability of correct word recognition comprises 

determining whether the word list contains only a single word, and 

wherein said terminating spelling and outputting the word is performed if only a 
single word is contained in the word list. 

48. (NEW) The electronically readable data medium as claimed in claim 47, 
wherein said method further comprises:

carrying out speech recognition of the word speech signals using the
word list with each word assigned a speech recognition probability; and 

determining whether a highest speech recognition probability and a 
second highest speech recognition probability differ from one another by a predetermined 
threshold value; and 

wherein if the predetermined threshold value is exceeded by a difference 
between the highest and second highest speech recognition probabilities, said terminating 
spelling and outputting the word is performed for the word in the word list with the highest 
speech recognition probability. 

49. (NEW) An electronically readable data medium storing at least one computer 
program to control a processor to perform a method for detecting and evaluating word speech 
signals representing a word from a user of a speech recognition system, said method 
comprising: 

detecting acoustic word speech signals from a user; 

carrying out a speech recognition operation; 

assessing probability of correct speech recognition; 

prompting the user to spell out each word for which the probability of correct 
speech recognition does not reach a first desired probability; 

detecting and evaluating letter signals as input by the user; 

carrying out a word recognition operation, after said detecting and evaluating of 
respective letter signals representing a single letter; 

assessing the probability of correct word recognition; 

terminating spelling and outputting a word obtained with a second desired 
probability by said assessing the probability of correct word recognition; and 

carrying out speech recognition of the word speech signals using the letter 
signals as detected and evaluated, if the correct word recognition is not obtained with the 
second desired probability. 

50. (NEW) An electronically readable data medium storing at least one computer 
program to control a processor to perform a method for detecting and evaluating word speech 
signals representing a word from a user of a speech recognition system, said method 
comprising: 

detecting acoustic word speech signals from a user; 






carrying out a speech recognition operation to obtain a speech recognition 

probability; 

assessing probability of correct speech recognition; 

prompting the user to spell out each word for which the probability of correct 
speech recognition does not reach a first desired probability; 

detecting and evaluating letter signals as input by the user of at least one letter, 
to obtain a letter recognition probability based on each detected letter signal; 

carrying out a word recognition operation, after said detecting and evaluating of 
respective letter signals representing a single letter, based on a combined recognition probability 
using the letter recognition probability and the speech recognition probability; 

assessing the probability of correct word recognition; and 

terminating spelling and outputting a word if the word is obtained with a second 
desired probability by said assessing the probability of correct word recognition. 

51. (NEW) The electronically readable data medium as claimed in claim 50, wherein
said method further comprises generating a word list based on the combined recognition 
probability. 

52. (NEW) The electronically readable data medium as claimed in claim 51, wherein
said terminating spelling and outputting the word is based solely on a single interrogation as to 
whether the combined recognition probability is the second desired recognition probability. 

53. (NEW) The electronically readable data medium as claimed in claim 52, wherein 
said terminating spelling and outputting the word includes 

outputting an appropriate message to the user; and 
terminating said detecting the acoustic word speech signals. 

54. (NEW) The electronically readable data medium as claimed in claim 53, wherein 
said method further comprises, after said detecting and evaluating of the letter speech signals 
respectively representing a letter: 

determining whether the user is continuing to speak, and if the user continues to 
speak the next speech signals respectively representing a letter are detected; and 

outputting one of the word list and a predetermined number of the words with 
highest probabilities in the word list, if the user is not continuing to speak.

IN THE ABSTRACT:

Please DELETE the Abstract in its entirety and replace with the attached Substitute 
Abstract. 



REMARKS

This Preliminary Amendment is submitted to improve the form of the English translation
as filed. It is respectfully requested that this Preliminary Amendment be entered in the
above-referenced application.

In accordance with the foregoing, claims 1-13 have been canceled and claims 14-54
have been added. Thus, claims 14-54 are pending and are under consideration.

A substitute specification is also being filed herewith. The substitute specification is
accompanied by a marked-up copy of the original specification.

If there are any questions regarding these matters, such questions can be addressed by
telephone to the undersigned. Otherwise, an early action on the merits is respectfully solicited.

If any further fees are required in connection with the filing of this Preliminary
Amendment, please charge same to our Deposit Account No. 19-3935.



Respectfully submitted, 



STAAS & HALSEY LLP 





Richard A. Gollhofer 
Registration No. 31,106 



700 Eleventh Street, NW, Suite 500 
Washington, D.C. 20001 
(202) 434-1500

SUBSTITUTE SPECIFICATION

TITLE OF THE INVENTION 

SYSTEM FOR DETECTING AND EVALUATING WORD SPEECH SIGNALS REPRESENTING 
A WORD FROM A USER OF A SPEECH RECOGNITION SYSTEM

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is based on and hereby claims priority to German Application No.
19942172.2, filed on September 3, 1999, the contents of which are hereby incorporated by
reference.

BACKGROUND OF THE INVENTION 

[0002] The invention relates to a method for detecting and evaluating word speech signals 
representing a word from a user of a speech recognition system, having the following steps: 

- detecting the acoustic word speech signal, 

- carrying out a speech recognition operation and assessing the probability of correct speech 
recognition, and 

- prompting the user to spell out the word if the speech recognition does not reach the desired
probability, and

- detecting and evaluating the letter speech signals spelt out by the user. 

[0003] Such a method is known from "Strategies for name recognition in automatic directory 
assistance systems", Andreas Kellner et al., in IEEE Workshop on Interactive Voice Technology
for Telecommunications Applications (IVTTA), pages 21-26, Turin, Italy, September 1998. A 
speech recognition system for a telephone network that has a word mode and a spelling mode 
is described therein. In the word mode, a word is input coherently by being spoken. In the 
spelling mode, a word is input by spelling out. If a word is not recognized in the word mode with 
a satisfactory recognition probability, a switchover is made into the spelling mode, in which the 
word is input by spelling out. Owing to the change into the spelling mode, the word mode can 
be based on a program that is relatively simple and quick to execute, and it is possible 
nevertheless to achieve a very high recognition rate, since in the case of all words not 
recognized exactly, the exact word is input in the spelling mode. However, the price for this high 
recognition rate is the user-unfriendly spelling out, which takes substantially longer than
uttering the corresponding word coherently.






SUMMARY OF THE INVENTION 

[0004] It is an object of the invention to develop the methods of the prior art in a more
user-friendly fashion.

[0005] The methods according to the invention are distinguished by the following steps: 

- carrying out a word recognition operation after the respective detection of the letter speech 
signals representing a single letter, and 

- assessing the probability of correct word recognition, and 



- if a word is obtained with the desired probability with the aid of the word recognition, the
spelling process is terminated and the word is output. 



[0006] In the case of the method according to the invention, an attempt is therefore made to 
determine the word to be detected as early as after the spelling out of each letter and, if a word 
is obtained with the desired detection probability, the further spelling process is terminated and 
the word is output. As a result, the spelling process, which is bothersome for a user, is reduced 
to a minimum such that the user-friendliness of the method is substantially enhanced by 
comparison with the known method, and yet an optimal recognition rate is achieved. 

[0007] According to a preferred embodiment, the speech recognition operation with the aid of 
which the coherently uttered word speech signals representing a word from a user are 
evaluated is executed on the basis of a smaller vocabulary than the word recognition 
operation with the aid of which the letter speech signals representing the individual 
letters are evaluated. As a result, the computational outlay on the speech recognition operation 
can be substantially reduced by comparison with a speech recognition operation that takes 
account of all possible words that may occur. A quick response of the method according to the 
invention is achieved thereby. 

[0008] According to a further preferred embodiment, a renewed speech recognition of the 
word speech signals is executed in the case of the word recognition operation with the aid of 
which the letter speech signals are evaluated, the results obtained by the evaluation of the letter 
speech signals being taken into account in this case. This is performed, for example, by virtue 
of the fact that the letter speech signals are used to draw up a word list that is used as 
vocabulary during the renewed speech recognition. 

[0009] The method according to the invention is ended upon termination of the spelling 
process, and the user of the speech recognition system is output a message to the effect that 
the spelling process is ended, or the word detected by the word recognition is imparted to him. 
However, it is also possible for only a predetermined dialog to be continued between the user 
and the speech recognition system. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0010] These and other objects and advantages of the present invention will become more 
apparent and more readily appreciated from the following description of the preferred 
embodiments, taken in conjunction with the accompanying drawings of which: 

Fig. 1 is a flowchart of the essential steps of the method according to the invention, 
Fig. 2 is a flowchart of the detection and evaluation of the letter speech signals spelt 
out by the user, 

Fig. 3 is a graph indicating how many letters must be spelt on average in order to 
achieve a predetermined acceptance rate, and 

Fig. 4 is a block diagram of a device for executing the method according to the 
invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

[0011] Reference will now be made in detail to the preferred embodiments of the present 
invention, examples of which are illustrated in the accompanying drawings, wherein like 
reference numerals refer to like elements throughout. 

[0012] The method according to the invention is explained below in more detail with the aid 
of an exemplary embodiment that is a constituent of an automatic directory inquiry service and 
which has an automatic speech recognition system for recognizing all German town names. 

[0013] 22,077 town names are listed in all German telephone books. These 22,077 words 
therefore constitute the total vocabulary that contains all town names to be determined with the 
aid of the speech recognition system. 



[0014] Acoustic word speech signals from a user are detected in a step S1 (Fig. 1). The 
word speech signals are acoustic signals that reproduce a word in a normal, coherent mode of 
utterance. The words are town names in the present exemplary embodiment. 

[0015] The word speech signals detected are evaluated by a speech recognition operation. 
Such speech recognition operations are known per se. They are used to generate a recognition 
result that comprises a word or a list of words, the probability of a correct speech recognition 
being determined in relation to each word and assigned to the respective word. 

[0016] A check is made in step S3 as to whether the speech recognition operation was able 
to determine a word with the desired speech recognition probability. If this is the case, the word 
determined in step S4, which is a town name in the present exemplary embodiment, is output 
and the method according to the invention is ended. 

[0017] If, by contrast, the result of step S3 is that no word could be determined with the 
required speech recognition probability, the method sequence goes over to step S5 with the aid 
of which a spelling process is executed in which the word to be determined is input through 
spelling out by the user and then evaluated appropriately. The spelling process is explained in 
more detail below. 

[0018] The word determined in step S5 is output in step S6, and the method is ended. 

[0019] In the exemplary embodiment relating to detecting and evaluating a town name, the 
speech recognition is not executed on the basis of the total vocabulary of 22,077 town names, 
but only on the basis of a medium vocabulary of approximately 1,000 to 5,000 town names. 
This vocabulary that is considerably reduced by comparison with the total vocabulary includes 
the town names most frequently asked for. The computer power required to run a computer 
program for speech recognition can be substantially diminished by the reduction in the 
vocabulary. The desired town name can be determined very quickly and with a high recognition 
rate for a majority of the queries because of the reduced vocabulary. 

[0020] Since the spelling process S5 is connected downstream of the speech recognition S2, 
the requirement placed on the probability of a correct speech recognition can be set very high, 
since a town name incorrectly recognized by the speech recognition can be input again into the 
spelling process, and a rejection in step S3 does not result in negative effects on the overall 
result of the method according to the invention. 

[0021] The requirements placed on the probability of a correct speech recognition are 
expediently set so high that the rate of the words that are erroneously recognized in step S2 and 
are, however, evaluated as correctly recognized in step S3 is smaller than 3% and preferably 
smaller than 1%. 
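
The decision logic of steps S3 to S6 can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name, the dictionary of per-word probabilities, and the numeric acceptance threshold are hypothetical and are not taken from the specification, and the spelling process of step S5 is represented by a placeholder callable.

```python
from typing import Callable, Dict


def recognize_town(
    word_scores: Dict[str, float],
    accept_threshold: float,
    spelling_process: Callable[[], str],
) -> str:
    """Accept the best first-pass hypothesis or fall back to spelling.

    word_scores is the (hypothetical) output of step S2: each town name of
    the reduced vocabulary mapped to its speech recognition probability.
    """
    if word_scores:
        best_word = max(word_scores, key=word_scores.get)
        if word_scores[best_word] >= accept_threshold:  # step S3
            return best_word                            # step S4
    return spelling_process()                           # steps S5 and S6


# Hypothetical values: a strict threshold rejects an uncertain hypothesis.
accepted = recognize_town({"Berlin": 0.99, "Bernau": 0.40}, 0.95, lambda: "<spelt>")
rejected = recognize_town({"Berlin": 0.60}, 0.95, lambda: "<spelt>")
```

Because a rejection merely hands the word over to the spelling process, the threshold can be chosen strictly without harming the overall result.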

[0022] The spelling process in accordance with step S5 is illustrated with its individual steps 
in a flowchart in Fig. 2. 

[0023] The user is prompted to spell out the town name in step S7. In the present exemplary 
embodiment, the user is prompted to spell out the letters individually. 

[0024] The letter speech signals, which represent an individual letter, are detected and 
recognized in step S8. 



[0025] A word list is drawn up in step S9 in accordance with the letter signals detected and 
evaluated in step S8. This word list is drawn up on the basis of the total vocabulary of all 
22,077 town names, the individual town names being assigned letter recognition probabilities. 
The letter recognition probability is the probability with which one or more spelt out letters of the 
word are correctly recognized with the aid of the detected and evaluated letter signals. 

[0026] If, for example, the first letter spelt out by the user is a "B", all the town names that 
begin with a "B" are assigned high letter recognition probabilities. Furthermore, all the town 
names that start with a "W" are also assigned relatively high letter recognition probabilities, 
since a "B" and a "W" sound very similar in German, and the "W" can therefore be recognized 
with a relatively high probability as a correct letter for the detected and evaluated letter signals 
of the spelt out "B". Town names that begin with another letter are therefore assigned 
substantially lower letter recognition probabilities. However, only the town names whose letter 
recognition probability is not lower than the highest determined letter recognition probability 
minus a predetermined threshold value SW1 are accepted in the word list. The remaining town 
names are not taken into account in the following steps. The method thereby applied for 
determining the word list is based on the Viterbi algorithm. 

[0027] The list therefore includes all town names that begin with a "B" or a "W" in the case of 
the input of a "B" as first letter. 
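
The selection rule with the threshold value SW1 can be illustrated by a short sketch. It shows only the thresholding, not the Viterbi-based scoring mentioned above; the function name, the town names chosen, and all numeric values are assumptions made for illustration.

```python
from typing import Dict


def build_word_list(letter_scores: Dict[str, float], sw1: float) -> Dict[str, float]:
    """Keep only town names whose score is within sw1 of the best score (step S9)."""
    if not letter_scores:
        return {}
    best = max(letter_scores.values())
    return {town: p for town, p in letter_scores.items() if p >= best - sw1}


# Hypothetical letter recognition probabilities after the user has spelt out a "B".
scores = {"Bonn": 0.60, "Berlin": 0.60, "Weimar": 0.30, "Hamburg": 0.02}
word_list = build_word_list(scores, sw1=0.40)
```

With these assumed values, the names beginning with "B" or "W" remain in the word list, while "Hamburg" is no longer taken into account.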

[0028] A check is made in the next step S10 as to whether the list includes only a single town 
name. If this is the case, the method sequence is transferred in step S11 to the main method in 
accordance with Fig. 1, where the town name determined is then output in step S6, and the 
spelling process is terminated. 



[0029] If, however, the word list includes a plurality of town names, the method sequence 
branches to step S12, in which case a speech recognition of the originally input word speech 
signals is executed anew, the speech recognition being based on the word list drawn up in step 
S9 as vocabulary. Since the speech recognition is based on the same word speech signals as 
in step S2, the same speech recognition probabilities are determined for the same words. This 
step differs from the speech recognition according to step S2 by virtue of the new vocabulary 
that has been determined from the spelt out letters. The town names newly added by contrast 
with the vocabulary originally used in step S2 are firstly assigned a speech recognition 
probability. The speech recognition probability determined by the speech recognition according 
to step S12 is not combined with the letter recognition probability determined according to the 
letter recognition of step S8. 

[0030] However, it is also possible to combine these two recognition probabilities as an 
alternative. They can be combined with one another by multiplication, for example. 
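
A combination by multiplication might look as follows. The sketch assumes that both probabilities are available as mappings from town name to value; the function name and the numbers are hypothetical.

```python
from typing import Dict


def combine_probabilities(speech: Dict[str, float], letter: Dict[str, float]) -> Dict[str, float]:
    """Multiply the two probabilities for every word present in both mappings."""
    return {word: speech[word] * letter[word] for word in speech if word in letter}


# Hypothetical values: "Worms" is absent from the letter-based word list.
combined = combine_probabilities(
    {"Bonn": 0.5, "Born": 0.4, "Worms": 0.1},
    {"Bonn": 0.6, "Born": 0.3},
)
best = max(combined, key=combined.get)
```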

[0031] A check is made in the next step S13 as to whether the highest speech recognition 
probability for a town name is higher than the second highest speech recognition probability of a 
further town name in the word list by a predetermined threshold value SW2. If this is the case, 
the method sequence branches to step S14 at which the method sequence is transferred again 
to the main method in which the town name with the highest speech recognition probability is 
output and the spelling process is terminated. If the highest speech recognition probability is 
not higher than the next highest speech recognition probability by the predetermined threshold 
value SW2, the method sequence goes over to step S15 in which the user is output a signal for 
speaking the next letter, and a pause is made for uttering the next letter. The signal is a short 
sound signal, for example. 
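
The check of step S13 can be sketched as a margin test. The function name and the numeric values are assumptions; in particular, the sketch reads "higher by SW2" as a strict inequality, which the specification does not state explicitly.

```python
from typing import Dict, Optional


def margin_decision(speech_probs: Dict[str, float], sw2: float) -> Optional[str]:
    """Return the leading town name if it beats the runner-up by more than sw2.

    A return value of None corresponds to the branch to step S15, in which
    the user is signalled to speak the next letter.
    """
    if not speech_probs:
        return None
    ranked = sorted(speech_probs.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]
    (best_word, best_p), (_, second_p) = ranked[0], ranked[1]
    return best_word if best_p - second_p > sw2 else None


# Hypothetical speech recognition probabilities for the current word list.
clear_case = margin_decision({"Bonn": 0.7, "Born": 0.2, "Worms": 0.1}, sw2=0.3)
close_case = margin_decision({"Bonn": 0.45, "Born": 0.40}, sw2=0.3)
```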



[0032] A check is made in step S16 as to whether the user is speaking a further letter. If the 
user speaks a further letter, the method sequence goes over again to step S8 at which the 
further letter is detected and recognized. A loop traversal with the steps S8, S9, S10, S11 or 
S12, S13, S14 or S15 and S16 is thereby begun. 

[0033] As in the first loop traversal, a new word list is drawn up in the case of each further 
loop traversal in step S9. For this purpose, the individual town names are again assigned a 
letter recognition probability. This letter recognition probability is determined on the basis of the 
recognition probabilities with which the individual letters of the town names have been correctly 
recognized by the detected and evaluated letter signals. The letter recognition probability is 
calculated by multiplying all the recognition probabilities of the sequence of spelt out letters of 
the town names for which a corresponding sequence of letter signals has been detected and 
evaluated. This calculation is executed in such a way that the letter recognition probability 
previously determined in step S8 is combined with, that is to say multiplied by, the letter 
recognition probability for the execution of step S8 in the preceding loop traversal. 
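
The multiplication can be written compactly. The symbols are notation introduced here for illustration only and do not appear in the specification: $w$ denotes a town name, $w_i$ its $i$-th letter, $x_i$ the $i$-th detected letter signal, and $P_k(w)$ the letter recognition probability of $w$ after $k$ spelt-out letters.

```latex
P_k(w) \;=\; \prod_{i=1}^{k} p(w_i \mid x_i) \;=\; P_{k-1}(w)\, p(w_k \mid x_k), \qquad P_0(w) = 1
```

If, for example, a "B" is recognized with an assumed probability of 0.6 and a subsequent "O" with 0.8, a town name beginning with "BO" receives the letter recognition probability 0.6 x 0.8 = 0.48 after two letters.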

[0034] Again, included in the word list are only the town names whose letter recognition 
probability is not lower than the highest determined letter recognition probability minus a 
predetermined threshold value SW1. The remaining town names are not taken into account in 
the following steps, that is to say a new list is drawn up, individual town names not being taken 
into account by comparison with the previous list, and others being included anew. However, a 
tendency arises in this case in accordance with which the number of the words in the word list is 
reduced with each loop traversal, since the recognition is the more specific the more letter 
signals are detected and evaluated. 



[0035] In the case of a plurality of traversals of the loop, the word list is reduced solely as 
determined by the letters input during spelling out, and the interrogation of step S10 is therefore 
based solely on the letter recognition probability. 

[0036] This loop is traversed until, as a result of one of the two interrogations in steps S10 
and S13, a town name has been determined with the required recognition probability. If steps 
S10 and S13 do not lead to termination of the loop, but a termination of the loop ensues in step 
S16, that is to say because it is established that the user is no longer speaking, the method 
sequence goes over to step S17 with which either the word that has the highest recognition 
probability, or a residual list having, for example, the three to ten words on the word list that 
have been assigned the highest recognition probabilities, is output. 
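
The control flow of the loop, with its two termination criteria and the fallback of step S17, can be summarized in a sketch. Everything concrete in it is an assumption made for illustration: each detected letter is modelled as a mapping from candidate letters to probabilities, a small floor value stands in for unlisted letters, the speech recognition of step S12 is a plain lookup function, the residual list of step S17 is ranked by letter recognition probability, and the vocabulary is assumed to be non-empty.

```python
from typing import Callable, Dict, Iterable, List, Union


def spelling_process(
    letter_observations: Iterable[Dict[str, float]],
    vocabulary: List[str],
    speech_prob: Callable[[str], float],
    sw1: float,
    sw2: float,
    n_best: int = 3,
    floor: float = 1e-6,
) -> Union[str, List[str]]:
    """Return one town name (S11 or S14) or a residual n-best list (S17)."""
    scores = {town: 1.0 for town in vocabulary}
    word_list = dict(scores)
    for position, letter_probs in enumerate(letter_observations):   # S8
        for town in scores:                                          # S9
            letter = town[position].upper() if position < len(town) else ""
            scores[town] *= letter_probs.get(letter, floor)
        best = max(scores.values())
        word_list = {t: p for t, p in scores.items() if p >= best - sw1}
        if len(word_list) == 1:                                      # S10 -> S11
            return next(iter(word_list))
        ranked = sorted(word_list, key=speech_prob, reverse=True)    # S12
        if speech_prob(ranked[0]) - speech_prob(ranked[1]) > sw2:    # S13 -> S14
            return ranked[0]
        # S15/S16: otherwise wait for the next letter, if the user speaks one
    ranked = sorted(word_list, key=word_list.get, reverse=True)      # S17
    return ranked[:n_best]


# Hypothetical data: the user spells "B", "O", "N" and then stops.
speech = {"Bonn": 0.50, "Born": 0.45, "Worms": 0.05, "Hamburg": 0.30}
letters = [{"B": 0.6, "W": 0.3}, {"O": 0.9}, {"N": 0.7, "R": 0.2}]
result = spelling_process(letters, list(speech), speech.get, sw1=0.25, sw2=0.2)
```

With these assumed values the third letter reduces the word list to a single entry, so the loop ends via step S10. If the user stops after the first letter instead, the sketch returns the residual list of step S17.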

[0037] Illustrated in Fig. 3 is a diagram showing how many letters have to be spelt out on 
average, that is to say how many loop traversals must be executed until a predetermined 
acceptance rate is reached. Illustrated by dashes in the diagram is the result for the method 
according to the invention, which has two termination criteria at steps S10 and S13. The result 
for a conventional method without such termination criteria is drawn with a continuous line. It 
may be seen from this diagram that, for example, after 7 letters have been spelt out an 
acceptance rate of only just above 40% is achieved with the known spelling methods, whereas 
an acceptance rate of over 80% is already achieved with the method according to the invention. 
Substantially fewer letters need be spelt out with the method according to the invention than is 
the case for conventional spelling methods. This diagram also shows that acceptance rates of 
80% to 100% are already achieved with the spelling out of six letters. 

[0038] Consequently, the method according to the invention can be used to limit the average 
number of letters to be spelt out to five to seven. 

[0039] A successful recognition rate was already achieved for an average number of 4.9 
letters with the aid of the described exemplary embodiment for detecting and evaluating the 
town names of Germany. 

[0040] The present invention is not limited to the automatic detection and recognition of town 
names, but is suitable, in particular, for all vocabularies with a limited number of words. It can, 
however, also be used for unlimited vocabularies. The method is then to be modified in a way 
known per se such that when words that are not yet included in the total vocabulary are being 
input by spelling them out a routine is executed with the aid of which these words are added to 
the total vocabulary. 

[0041] The method according to the invention can also be modified in such a way that the 
individual words are assigned a combined recognition probability on the basis of the letter and 
speech recognition. In this case, the word list is drawn up on the basis of the combined 
recognition probability. As a consequence of this, the two recognition probabilities of the above 
described exemplary embodiment, on which the termination criteria according to steps S10 and 
S13 are based, are identical, for which reason one termination criterion can be deleted. 

[0042] According to a simplified method, it is also possible at each loop traversal only to 
remove words from a word list once drawn up. Since no new words can be added thereby, the 
speech recognition of step S12 need be executed only the once during the first loop traversal, 
since town names set forth in the list have all been evaluated already with the corresponding 
speech recognition probability. 

[0043] In the above exemplary embodiment, the letters are uttered in isolation when being 
spelt out, and are respectively recognized individually. However, it is also possible for the letters 
to be uttered continuously when being spelt out. In the case of such a continuous method, step 
S15 (signal for the next letter and pause) can be eliminated. 

[0044] Figure 4 shows a device, specifically a telephone communication system 1 for an 
automatic directory inquiry service. The telephone communication system 1 is designed as a 
digitally operating telephone communication system having an internal databus 2, a central 
processor unit 3, a memory unit 4, a speech recognition unit 5 and a speech output unit 6. 
Connected to the databus 2 is a switching unit 7 via which the analog and digital telephone lines 
8, 9 can be switched to connect to the units 3 to 6 via the databus 2. Telephone terminals 10 
are connected to the telephone lines 8, 9, it being possible for one or more exchanges 11 to be 
connected therebetween. 

[0045] A user can select the telephone communication system 1 at one of the telephone 
terminals 10, by which means he is connected to the units 3 to 6 via the switching unit 7 and the 
databus 2. 

[0046] A plurality of computer programs are stored in the memory unit 4. One of these is 
provided for executing the above described exemplary embodiment for recognizing town 
names. If, in his dialog with the telephone communication system, the user reaches a point at 
which the town name needs to be input, an appropriate prompt is output to the user by the 
speech output unit 6, and the user then speaks a town name. The latter is then detected with 
the aid of the speech recognition unit 5 and evaluated in accordance with the above described 
method, the user being prompted, if required, to spell out the town name. 

[0047] Since the method according to the invention is not limited to the recognition of town 
names, a plurality of programs that operate in accordance with the method according to the 
invention can be provided, these programs each being capable of recognizing words of specific 
vocabularies, for example personal names and company names, numbers, stocks or the like. 
These computer programs designed according to the invention are called up and controlled by a 
higher level dialog control program. 

[0048] These computer programs can also be stored on an electronically readable data 
medium and, for example, can be transmitted to another telephone communication system. 



[0049] The invention has been described in detail with particular reference to preferred 
embodiments thereof and examples, but it will be understood that variations and modifications 
can be effected within the spirit and scope of the invention. 

ABSTRACT 

METHOD FOR DETECTING AND EVALUATING WORD SPEECH SIGNALS REPRESENTING 
A WORD FROM A USER OF A SPEECH RECOGNITION SYSTEM 

In the event of a possibly incorrect speech recognition, the user is prompted to spell out 
the corresponding word. After each spelt-out letter, word recognition is executed such that the 
spelling process can be terminated given a satisfactory recognition probability. 

MARKED-UP COPY OF SUBSTITUTE SPECIFICATION 

[Description] TITLE OF THE INVENTION 

[METHOD] SYSTEM FOR DETECTING AND EVALUATING WORD SPEECH SIGNALS 
REPRESENTING A WORD FROM A USER OF A SPEECH RECOGNITION SYSTEM 

CROSS REFERENCE TO RELATED APPLICATIONS 

[0001] This application is based on and hereby claims priority to German Application No. 
19942172.2 filed on September 3, 1999 in, the contents of which are hereby incorporated by 
reference. 

BACKGROUND OF THE INVENTION 

[0002] The invention relates to a method for detecting and evaluating word speech signals 
representing a word from a user of a speech recognition system, having the following steps: 

- detecting the acoustic word speech signal, 

- carrying out a speech recognition operation and assessing the probability of correct speech 
recognition, and 

- if the speech recognition does not reach the desired probability, the user is prompted to spell 
out the word, and 

- detecting and evaluating the letter speech signals spelt out by the user. 

[0003] Such a method is known from "Strategies for name recognition in automatic directory 
assistance systems", Andreas Kellner et al., in IEEE Workshop on interactive Voice Technology 
for Telecommunications Applications (IVTTA), pages 21-26, Turin, Italy, September 1998. A 
speech recognition system for a telephone network that has a word mode and a spelling mode 
is described therein. In the word mode, a word is input coherently by being spoken. In the 
spelling mode, a word is input by spelling out. If a word is not recognized in the word mode with 
a satisfactory recognition probability, a switchover is made into the spelling mode, in which the 
word is input by spelling out. Owing to the change into the spelling mode, the word mode can 
be based on a program that is relatively simple and quick to execute, and it is possible 
nevertheless to achieve a very high recognition rate, since in the case of all words not 
recognized exactly, the exact word is input in the spelling mode. However, the price for this high 
recognition rate is the user-unfriendly spelling out, which lasts substantially longer than when 
the corresponding word is uttered coherently. 



SUMMARY OF THE INVENTION 

[0004] It is [therefore the] an object of the invention to develop the [method mentioned at the 
beginning] methods of the prior art in a more user friendly fashion. [The object is achieved by a 
method having the features of claim 1. Advantageous refinements of the invention are specified 
in the subclaims.] 

[0005] The [method] methods according to the invention [is] are distinguished by the 
following steps: 

- carrying out a word recognition operation after the respective detection of the letter speech 
signals representing a single letter, and 

- assessing the probability of correct word recognition, and 

- if a word is obtained with the desired probability with the aid of the word recognition, the 
spelling process is terminated and the word is output. 

[0006] In the case of the [method] methods according to the invention, an attempt is 
therefore made to determine the word to be detected as early as after the spelling out of each 
letter and, if a word is obtained with the desired detection probability, the further spelling 
process is terminated and the word is output. As a result, the spelling process, which is 
bothersome for a user, is reduced to a minimum such that the user-friendliness of the method is 
substantially enhanced by comparison with the known method, and yet an optimal recognition 
rate is achieved. 

[0007] According to a preferred embodiment, the speech recognition operation with the aid of 
which the coherently uttered word speech signals representing a word from a user are 
evaluated is executed on the basis of a smaller vocabulary than the word recognition 
operation with the aid of which the letter speech signals representing the individual 
letters are evaluated. As a result, the computational outlay on the speech recognition operation 
can be substantially reduced by comparison with a speech recognition operation that takes 
account of all possible words that may occur. A quick response of the method according to the 
invention is achieved thereby. 



[0008] According to a further preferred embodiment, a renewed speech recognition of the 
word speech signals is executed in the case of the word recognition operation with the aid of 
which the letter speech signals are evaluated, the results obtained by the evaluation of the letter 
speech signals being taken into account in this case. This is performed, for example, by virtue 
of the fact that the letter speech signals are used to draw up a word list that is used as 
vocabulary during the renewed speech recognition. 

[0009] The method according to the invention is ended upon termination of the spelling 
process, and the user of the speech recognition system is output a message to the effect that 
the spelling process is ended, or the word detected by the word recognition is imparted to him. 
However, it is also possible for only a predetermined dialog to be continued between the user 
and the speech recognition system. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0010] [The] These and other objects and advantages of the present invention [is explained 
below in more detail with the aid of an exemplary embodiment illustrated in the drawing. In] will 
become more apparent and more readily appreciated from the following description of the 
preferred embodiments, taken in conjunction with the accompanying drawings of which: 

Fig. 1 [shows] is a flowchart of the essential steps of the method according to the 
invention, [in a flowchart] 

Fig. 2 [shows] is a flowchart of the detection and evaluation of the letter speech signals 
spelt out by the user, [in a flowchart,] 

Fig. 3 [shows a diagram] is a graph indicating how many letters must be spelt on 
average in order to achieve a predetermined acceptance rate, and 

Fig. 4 [shows] is a block diagram of a device for executing the method according to the 
invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

[0011] Reference will now be made in detail to the preferred embodiments of the present 
invention, examples of which are illustrated in the accompanying drawings, wherein like 
reference numerals refer to like elements throughout. 

[0012] The method according to the invention is explained below in more detail with the aid 
of an exemplary embodiment that is a constituent of an automatic directory inquiry service and 
which has an automatic speech recognition system for recognizing all German town names. 



[0013] 22,077 town names are listed in all German telephone books. These 22,077 words 
therefore constitute the total vocabulary that contains all town names to be determined with the 
aid of the speech recognition system. 

[0014] Acoustic word speech signals from a user are detected in a step S1 ([figure] Fig. 1). 
The word speech signals are acoustic signals that reproduce a word in a normal, coherent 
mode of utterance. The words are town names in the present exemplary embodiment. 

[0015] The word speech signals detected are evaluated by [means of] a speech recognition 
operation. Such speech recognition operations are known per se. They are used to generate a 
recognition result that comprises a word or a list of words, the probability of a correct speech 
recognition being determined in relation to each word and assigned to the respective word. 

[0016] A check is made in step S3 as to whether the speech recognition operation was able 
to determine a word with the desired speech recognition probability. If this is the case, the word 
determined in step S4, which is a town name in the present exemplary embodiment, is output 
and the method according to the invention is ended. 

[0017] If, by contrast, the result of step S3 is that no word could be determined with the 
required speech recognition probability, the method sequence goes over to step S5 with the aid 
of which a spelling process is executed in which the word to be determined is input through 
spelling out by the user and then evaluated appropriately. The spelling process is explained in 
more detail below. 

[0018] The word determined in step S5 is output in step S6, and the method is ended. 

[0019] In the exemplary embodiment relating to detecting and evaluating a town name, the 
speech recognition is not executed on the basis of the total vocabulary of 22,077 town names, 
but only on the basis of a medium vocabulary of approximately 1,000 to 5,000 town names. 
This vocabulary that is considerably reduced by comparison with the total vocabulary includes 
the town names most frequently asked for. The computer power required to run a computer 
program for speech recognition can be substantially diminished by the reduction in the 
vocabulary. The desired town name can be determined very quickly and with a high recognition 
rate for a majority of the queries because of the reduced vocabulary. 

[0020] Since the spelling process S5 is connected downstream of the speech recognition S2, 
the requirement placed on the probability of a correct speech recognition can be set very high, 
since a town name incorrectly recognized by the speech recognition can be input again into the 
spelling process, and a rejection in step S3 does not result in negative effects on the overall 
result of the method according to the invention. 

[0021] The requirements placed on the probability of a correct speech recognition are 
expediently set so high that the rate of the words that are erroneously recognized in step S2 and 
are, however, evaluated as correctly recognized in step S3 is smaller than 3% and preferably 
smaller than 1%. 

[0022] The spelling process in accordance with step S5 is illustrated with its individual steps 
in a flowchart in [figure] Fig. 2. 

[0023] The user is prompted to spell out the town name in step S7. In the present exemplary 
embodiment, the user is prompted to spell out the letters individually. 

[0024] The letter speech signals, which represent an individual letter, are detected and 
recognized in step S8. 

[0025] A word list is drawn up in step S9 in accordance with the letter signals detected and 
evaluated in step S8. This word list is drawn up on the basis of the total vocabulary of all 
22,077 town names, the individual town names being assigned letter recognition probabilities. 
The letter recognition probability is the probability with which one or more spelt out letters of the 
word are correctly recognized with the aid of the detected and evaluated letter signals. 

[0026] If, for example, the first letter spelt out by the user is a "B", all the town names that 
begin with a "B" are assigned high letter recognition probabilities. Furthermore, all the town 
names that start with a "W" are also assigned relatively high letter recognition probabilities, 
since a "B" and a "W" sound very similar in German, and the "W" can therefore be recognized 
with a relatively high probability as a correct letter for the detected and evaluated letter signals 
of the spelt out "B". Town names that begin with another letter are therefore assigned 
substantially lower letter recognition probabilities. However, only the town names whose letter 
recognition probability is not lower than the highest determined letter recognition probability 
minus a predetermined threshold value SW1 are accepted in the word list. The remaining town 
names are not taken into account in the following steps. The method thereby applied for 
determining the word list is based on the Viterbi algorithm. 
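
The acceptance rule for the word list in step S9 can be illustrated with a minimal sketch. It assumes that every town name has already been assigned a letter recognition probability; the function name `build_word_list`, the example names and all numerical values are hypothetical, and the Viterbi-based scoring named above is not reproduced here.

```python
def build_word_list(letter_probs, sw1):
    """Keep the names whose letter recognition probability is not lower
    than the highest determined probability minus the threshold SW1."""
    if not letter_probs:
        return {}
    best = max(letter_probs.values())
    return {name: p for name, p in letter_probs.items() if p >= best - sw1}


# Hypothetical probabilities after a spelt-out "B": names starting with
# "B" score high, names starting with "W" relatively high, others low.
letter_probs = {"Berlin": 0.60, "Bonn": 0.60, "Weimar": 0.45, "Hamburg": 0.05}
word_list = build_word_list(letter_probs, sw1=0.20)
```

With these assumed values the list retains the names beginning with "B" or "W" and drops the remainder, which corresponds to the behaviour described for the first spelt-out letter.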



[0027] The list therefore includes all town names that begin with a "B" or a "W" in the case of 
the input of a "B" as first letter. 

[0028] A check is made in the next step S10 as to whether the list includes only a single town 
name. If this is the case, the method sequence is transferred in step S11 to the main method in
accordance with [figure] Fig. 1, where the town name determined is then output in step S6, and
the spelling process is terminated. 

[0029] If, however, the word list includes a plurality of town names, the method sequence 
branches to step S12, in which case a speech recognition of the originally input word speech 
signals is executed anew, the speech recognition being based on the word list drawn up in step 
S9 as vocabulary. Since the speech recognition is based on the same word speech signals as 
in step S2, the same speech recognition probabilities are determined for the same words. This 
step differs from the speech recognition according to step S2 by virtue of the new vocabulary 
that has been determined from the spelt out letters. The town names newly added by contrast 
with the vocabulary originally used in step S2 are firstly assigned a speech recognition 
probability. The speech recognition probability determined by the speech recognition according 
to step S12 is not combined with the letter recognition probability determined according to the 
letter recognition of step S8. 

[0030] However, it is also possible to combine these two recognition probabilities as an 
alternative. They can be combined with one another by [means of] multiplication, for example. 
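
The alternative of combining the two recognition probabilities by multiplication can be sketched as follows. The function name `combine` and the numerical values are hypothetical; only names present in both inputs are scored.

```python
def combine(speech_probs, letter_probs):
    """Multiply speech and letter recognition probabilities per town name."""
    shared = speech_probs.keys() & letter_probs.keys()
    return {name: speech_probs[name] * letter_probs[name] for name in shared}


# Hypothetical values for two town names on the word list.
combined = combine({"Bonn": 0.5, "Worms": 0.1}, {"Bonn": 0.4, "Worms": 0.5})
```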

[0031] A check is made in the next step S13 as to whether the highest speech recognition 
probability for a town name is higher than the second highest speech recognition probability of a 
further town name in the word list by a predetermined threshold value SW2. If this is the case, 
the method sequence branches to step S14 at which the method sequence is transferred again 
to the main method in which the town name with the highest speech recognition probability is 
output and the spelling process is terminated. If the highest speech recognition probability is 
not higher than the next highest speech recognition probability by the predetermined threshold 
value SW2, the method sequence goes over to step S15 in which the user is output a signal for 
speaking the next letter, and a pause is made for uttering the next letter. The signal is a short 
sound signal, for example. 
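
The check of step S13 can be sketched as a comparison of the two highest speech recognition probabilities. The sketch assumes that "higher by SW2" means a difference strictly greater than SW2; the function name `clear_winner` and the values are hypothetical.

```python
def clear_winner(speech_probs, sw2):
    """Return the best name if it leads the runner-up by more than SW2,
    otherwise None (the next letter would then be requested, step S15)."""
    ranked = sorted(speech_probs.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (best_name, best_p), (_, second_p) = ranked[0], ranked[1]
    return best_name if best_p - second_p > sw2 else None


decided = clear_winner({"Bonn": 0.7, "Berlin": 0.2}, sw2=0.3)
undecided = clear_winner({"Bonn": 0.5, "Berlin": 0.4}, sw2=0.3)
```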



[0032] A check is made in step S16 as to whether the user is speaking a further letter. If the
user speaks a further letter, the method sequence goes over again to step S8 at which the
further letter is detected and recognized. A loop traversal with the steps S8, S9, S10, S11 or
S12, S13, S14 or S15 and S16 is thereby begun.

[0033] As in the first loop traversal, a new word list is drawn up in the case of each further
loop traversal in step S9. For this purpose, the individual town names are again assigned a
letter recognition probability. This letter recognition probability is determined on the basis of the 
recognition probabilities with which the individual letters of the town names have been correctly 
recognized by the detected and evaluated letter signals. The letter recognition probability is 
calculated by multiplying all the recognition probabilities of the sequence of spelt out letters of 
the town names for which a corresponding sequence of letter signals has been detected and 
evaluated. This calculation is executed in such a way that the letter recognition probability 
previously determined in step S8 is combined with, that is to say multiplied by, the letter 
recognition probability for the execution of step S8 in the preceding loop traversal.
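
The multiplicative update of the letter recognition probability can be sketched as below. It is a simplified illustration: each spelt-out letter is assumed to yield a hypothetical table of per-letter recognition probabilities, names shorter than the current position are assumed to receive probability zero, and the names `update_letter_probs`, `first` and `second` are not taken from the described embodiment.

```python
def update_letter_probs(prev_probs, per_letter_probs, vocabulary, position):
    """Multiply each name's previous letter recognition probability by the
    recognition probability of its letter at the current position."""
    updated = {}
    for name in vocabulary:
        if position >= len(name):
            updated[name] = 0.0  # assumption: name too short to match
            continue
        letter = name[position].upper()
        updated[name] = prev_probs.get(name, 1.0) * per_letter_probs.get(letter, 0.0)
    return updated


vocabulary = ["Bonn", "Berlin", "Worms"]
# Hypothetical recognition results for two spelt-out letters.
first = update_letter_probs({}, {"B": 0.6, "W": 0.4}, vocabulary, position=0)
second = update_letter_probs(first, {"O": 0.7, "E": 0.3}, vocabulary, position=1)
```

With these assumed values "Bonn" ends at roughly 0.6 x 0.7, "Worms" at roughly 0.4 x 0.7 and "Berlin" at roughly 0.6 x 0.3, so a name that was less likely after the first letter can overtake another after the second.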

[0034] Again, included in the word list are only the town names whose letter recognition 
probability is not lower than the highest determined letter recognition probability minus a 
predetermined threshold value SW1. The remaining town names are not taken into account in
the following steps, that is to say a new list is drawn up, individual town names not being taken
into account by comparison with the previous list, and others being included anew. However, the
number of words in the word list tends to be reduced with each loop traversal, since the
recognition becomes more specific the more letter signals are detected and evaluated.

[0035] In the case of a plurality of traversals of the loop, the word list is reduced solely as
determined by the letters input during spelling out, and the interrogation of step S10 is therefore 
based solely on the letter recognition probability. 

[0036] This loop is traversed until, as a result of one of the two interrogations in steps S10
and S13, a town name has been determined with the required recognition probability. If steps
S10 and S13 do not lead to termination of the loop, but a termination of the loop ensues in step
S16, that is to say because it is established that the user is no longer speaking, the method
sequence goes over to step S17 with which either the word that has the highest recognition
probability, or a residual list having, for example, the three to ten words on the word list that
have been assigned the highest recognition probabilities, is output.
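
The interplay of the termination criteria of steps S10, S13 and S16/S17 can be sketched as one loop. This is an illustration under stated assumptions, not the implementation of the embodiment: letter recognition is replaced by hypothetical per-letter probability tables, the renewed speech recognition of step S12 by a hypothetical callable `speech_score`, the total vocabulary is assumed to be non-empty, and the residual list is ranked here by letter recognition probability.

```python
def spelling_process(letter_observations, vocabulary, speech_score, sw1, sw2, n_best=3):
    """Return (result, terminating_step) for a sequence of spelt-out letters."""
    letter_probs = {name: 1.0 for name in vocabulary}
    word_list = dict(letter_probs)
    for position, observed in enumerate(letter_observations):   # S8, one letter
        for name in vocabulary:                                  # S9: update
            if position < len(name):
                letter_probs[name] *= observed.get(name[position].upper(), 0.0)
            else:
                letter_probs[name] = 0.0
        best = max(letter_probs.values())
        word_list = {n: p for n, p in letter_probs.items() if p >= best - sw1}
        if len(word_list) == 1:                                  # S10 -> S11
            return next(iter(word_list)), "S10"
        ranked = sorted(((speech_score(n), n) for n in word_list), reverse=True)  # S12
        if ranked[0][0] - ranked[1][0] > sw2:                    # S13 -> S14
            return ranked[0][1], "S13"
    # S16 found no further letter: output a residual list (S17).
    ranked_names = sorted(word_list, key=word_list.get, reverse=True)
    return ranked_names[:n_best], "S17"


vocabulary = ["Bonn", "Berlin", "Worms"]
scores = {"Bonn": 0.5, "Berlin": 0.45, "Worms": 0.05}   # hypothetical S12 results
result_a = spelling_process(
    [{"B": 0.6, "W": 0.4}, {"O": 0.7, "E": 0.3}], vocabulary, scores.get, 0.22, 0.2)
result_b = spelling_process(
    [{"B": 0.9, "W": 0.1}, {"O": 0.9, "E": 0.1}], vocabulary, scores.get, 0.22, 0.2)
```

In the first assumed run the second letter removes "Berlin" from the list, after which the speech recognition probabilities differ by more than SW2 and the loop ends at step S13; in the second run the letter recognition alone leaves a single name and the loop ends at step S10.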

[0037] Illustrated in [figure] Fig. 3 is a diagram showing how many letters have to be spelt out
on average, that is to say how many loop traversals must be executed until a predetermined
acceptance rate is reached. Illustrated by dashes in the diagram is the result for the method 
according to the invention, which has two termination criteria at steps S10 and S13. The result 
for a conventional method without such termination criteria is drawn with a continuous line. It 
may be seen from this diagram that, for example, after 7 letters have been spelt out an 
acceptance rate of only just above 40% is achieved with the known spelling methods, whereas 
an acceptance rate of over 80% is already achieved with the method according to the invention. 
Substantially fewer letters need be spelt out with the method according to the invention than is 
the case for conventional spelling methods. This diagram also shows that acceptance rates of 
80% to 100% are already achieved with the spelling out of six letters. 

[0038] Consequently, the method according to the invention can be used to limit the average 
number of letters to be spelt out to five to seven. 

[0039] A successful recognition rate was already achieved for an average number of 4.9 
letters with the aid of the described exemplary embodiment for detecting and evaluating the 
town names of Germany. 

[0040] The present invention is not limited to the automatic detection and recognition of town 
names, but is suitable, in particular, for all vocabularies with a limited number of words. It can, 
however, also be used for unlimited vocabularies. The method is then to be modified in a way 
known per se such that when words that are not yet included in the total vocabulary are being 
input by spelling them out a routine is executed with the aid of which these words are added to 
the total vocabulary. 

[0041] The method according to the invention can also be modified in such a way that the 
individual words are assigned a combined recognition probability on the basis of the letter and 
speech recognition. In this case, the word list is drawn up on the basis of the combined 
recognition probability. As a consequence of this, the two recognition probabilities of the above 
described exemplary embodiment, on which the termination criteria according to steps S10 and 
S13 are based, are identical, for which reason one termination criterion can be deleted.

[0042] According to a simplified method, it is also possible at each loop traversal only to
remove words from a word list once drawn up. Since no new words can be added thereby, the
speech recognition of step S12 need be executed only once, during the first loop traversal,
since the town names in the list have all already been evaluated with the corresponding
speech recognition probability.

[0043] In the above exemplary embodiment, the letters are uttered in isolation when being 
spelt out, and are respectively recognized individually. However, it is also possible for the letters 
to be uttered continuously when being spelt out. In the case of such a continuous method, step 
S15 (signal for the next letter and pause) can be eliminated. 

[0044] Figure 4 shows a device, specifically a telephone communication system 1 for an 
automatic directory inquiry service. The telephone communication system 1 is designed as a 
digitally operating telephone communication system having an internal databus 2, a central 
processor unit 3, a memory unit 4, a speech recognition unit 5 and a speech output unit 6. 
Connected to the databus 2 is a switching unit 7 via which the analog and digital telephone lines 
8, 9 can be switched to connect to the units 3 to 6 via the databus 2. Telephone terminals 10 
are connected to the telephone lines 8, 9, it being possible for one or more exchanges 11 to be 
connected therebetween.

[0045] A user can select the telephone communication system 1 at one of the telephone 
terminals 10, by which means he is connected to the units 3 to 6 via the switching unit 7 and the 
databus 2. 

[0046] A plurality of computer programs are stored in the memory unit 4. One of these is 
provided for executing the above described exemplary embodiment for recognizing town 
names. If, in his dialog with the telephone communication system, the user reaches a point at 
which the town name needs to be input, an appropriate prompt is output to the user by [means 
of] the speech output unit 6, and the user then speaks a town name. The latter is then detected 
with the aid of the speech recognition unit 5 and evaluated in accordance with the above 
described method, the user being prompted, if required, to spell out the town name. 

[0047] Since the method according to the invention is not limited to the recognition of town 
names, a plurality of programs that operate in accordance with the method according to the 
invention can be provided, these programs each being capable of recognizing words of specific 
vocabularies, for example personal names and company names, numbers, stocks or the like.
These computer programs designed according to the invention are called up and controlled by a
higher level dialog control program.

[0048] These computer programs can also be stored on an electronically readable data 
medium and, for example, can be transmitted to another telephone communication system. 

[0049] The invention has been described in detail with particular reference to preferred 
embodiments thereof and examples, but it will be understood that variations and modifications 
can be effected within the spirit and scope of the invention.

Method for detecting and evaluating word speech signals
representing a word from a user of a speech recognition system



The invention relates to a method for detecting and evaluating 
word speech signals representing a word from a user of a speech 
recognition system, having the following steps: 
detecting the acoustic word speech signal, 

carrying out a speech recognition operation and assessing 
the probability of correct speech recognition, and 
if the speech recognition does not reach the desired 
probability, the user is prompted to spell out the word, and 
detecting and evaluating the letter speech signals spelt out 
by the user. 



Such a method is known from "Strategies for name recognition in
automatic directory assistance systems", Andreas Kellner et al.,
in IEEE Workshop on Interactive Voice Technology for
Telecommunications Applications (IVTTA), pages 21-26, Turin,
Italy, September 1998. A speech recognition system for a telephone
network that has a word mode and a spelling mode is described
therein. In the word mode, a word is input coherently by being
spoken. In the spelling mode, a word is input by spelling out. If
a word is not recognized in the word mode with a satisfactory
recognition probability, a switchover is made into the spelling
mode, in which the word is input by spelling out. Owing to the
change into the spelling mode, the word mode can be based on a
program that is relatively simple and quick to execute, and it is
possible nevertheless to achieve a very high recognition rate,
since in the case of all words not recognized exactly, the exact
word is input in the spelling mode. However, the price for this high
recognition rate is the user-unfriendly spelling out, which lasts
substantially longer than when the corresponding word is uttered
coherently.

US 5,638,425 discloses a method for detecting and evaluating word
speech signals representing a word from a user of an automatic
call information service. The method has a word mode and a
spelling mode. If, in the word mode, the speech recognition does
not recognize an acoustic speech signal with a desired
probability, the method switches over into the spelling mode and
prompts the user to spell out the word, for which purpose the user
spells out the entire word or parts thereof.



Kaspar B. et al., "Spracherkennung für großes Vokabular durch
Buchstabieren" ["Speech recognition for a large vocabulary by
means of spelling"], ITG Fachberichte, April 28th, 1986, pages 31-36,
discloses an automatic call information service having a
spelling mode that carries out a word recognition operation after
the respective detection of letter speech signals representing a
letter, in which case the spelling process is terminated and the
word is output if a word with the desired recognition probability
is obtained by the word recognition.



It is the object of the invention to develop said methods in a
user-friendly fashion.

The object is achieved by means of methods, a device and a
computer program product having the features of the independent
claims. Advantageous refinements of the invention are specified in
the subclaims.



The methods include the following steps:
carrying out a word recognition operation after the
respective detection of the letter speech signals
representing a single letter, and
assessing the probability of correct word recognition, and
if a word is obtained with the desired probability with the
aid of the word recognition, the spelling process is
terminated and the word is output.

In the case of the methods, an attempt is therefore made to 
determine the word to be detected as early as after the spelling 
out of each letter and, if a word is obtained with the desired 
detection probability, the further spelling process is terminated 
and the word is output. As a result, the spelling process, which
is bothersome for a user, is reduced to a minimum such that the
user-friendliness of the method is substantially enhanced by
comparison with the known method, and yet an optimal recognition
rate is achieved.



Moreover, the speech recognition operation with the aid of which the
word speech signals representing a word from a user, which signals
are uttered coherently, are evaluated is executed on the basis of a
smaller vocabulary than in the case of the word recognition operation
with the aid of which the letter speech signals representing the
individual letters are evaluated. As a result, the computational
outlay on the speech recognition operation can be substantially
reduced by comparison with a speech recognition operation that
takes account of all possible words that may occur. A quick
response of the method according to the invention is achieved
thereby.



Alternatively or in addition, a renewed speech recognition of the
word speech signals is executed in the case of the word
recognition operation with the aid of which the letter speech
signals are evaluated, the results obtained by the evaluation of
the letter speech signals being taken into account in this case.
This is performed, for example, by virtue of the fact that the
letter speech signals are used to draw up a word list that is used
as vocabulary during the renewed speech recognition.

The method according to the invention is ended upon termination of
the spelling process, and the user of the speech recognition
system is output a message to the effect that the spelling process
is ended, or the word detected by the word recognition is imparted
to him. However, it is also possible for only a predetermined
dialog to be continued between the user and the speech recognition
system.

The invention is explained below in more detail with the aid of an
exemplary embodiment illustrated in the drawing. In the drawings:

Fig. 1 shows the essential steps of the method according to the
invention, in a flowchart,

Fig. 2 shows the detection and evaluation of the letter speech
signals spelt out by the user, in a flowchart,

Fig. 3 shows a diagram indicating how many letters must be spelt
on average in order to achieve a predetermined acceptance
rate, and

Fig. 4 shows a device for executing the method according to the
invention.

The method according to the invention is explained below in more 
detail with the aid of an exemplary embodiment that is a 
constituent of an automatic directory inquiry service and which 
has an automatic speech recognition system for recognizing all 
German town names. 

22,077 town names are listed in all German telephone books. These 
22,077 words therefore constitute the total vocabulary that 
contains all town names to be determined with the aid of the 
speech recognition system. 

Acoustic word speech signals from a user are detected in a step S1
(figure 1). The word speech signals are acoustic signals that
reproduce a word in a normal, coherent mode of utterance. The 
words are town names in the present exemplary embodiment. 

The word speech signals detected are evaluated in step S2 by means
of a speech recognition operation. Such speech recognition operations
are known per se. They are used to generate a recognition result
that comprises a word or a list of words, the probability of a 
correct speech recognition being determined in relation to each 
word and assigned to the respective word. 

A check is made in step S3 as to whether the speech recognition
operation was able to determine a word with the desired speech
recognition probability. If this is the case, the word determined,
which is a town name in the present exemplary embodiment, is output
in step S4 and the method according to the invention is ended.

If, by contrast, the result of step S3 is that no word could be 
determined with the required speech recognition probability, the 
method sequence goes over to step S5 with the aid of which a 
spelling process is executed in which the word to be determined is 
input through spelling out by the user and then evaluated 
appropriately. The spelling process is explained in more detail 
below. 

The word determined in step S5 is output in step S6, and the 
method is ended. 
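
The main method of figure 1 can be summarized in a minimal sketch. It assumes that step S2 delivers a mapping from town names to speech recognition probabilities and that the spelling process of step S5 is available as a callable; the names `recognize_word`, `threshold` and `spelling_process` and all values are hypothetical.

```python
def recognize_word(speech_probs, threshold, spelling_process):
    """Output the best word if it reaches the desired probability (S3, S4),
    otherwise fall back to the spelling process (S5, S6)."""
    if speech_probs:
        best = max(speech_probs, key=speech_probs.get)
        if speech_probs[best] >= threshold:   # S3: desired probability reached
            return best                        # S4: output the word
    return spelling_process()                  # S5 and S6: spell out, then output


accepted = recognize_word({"Bonn": 0.99, "Berlin": 0.01}, 0.97, lambda: "unused")
spelt = recognize_word({"Bonn": 0.60, "Berlin": 0.40}, 0.97, lambda: "Borna")
```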

In the exemplary embodiment relating to detecting and evaluating a 
town name, the speech recognition is not executed on the basis of 
the total vocabulary of 22,077 town names, but only on the basis
of a medium vocabulary of approximately 1,000 to 5,000 town names. 
This vocabulary that is considerably reduced by comparison with 
the total vocabulary includes the town names most frequently asked 
for. The computer power required to run a computer program for 
speech recognition can be substantially diminished by the 
reduction in the vocabulary. The desired town name can be 
determined very quickly and with a high recognition rate for a 
majority of the queries because of the reduced vocabulary. 

Since the spelling process S5 is connected downstream of the
speech recognition S2, the requirement placed on the probability of
a correct speech recognition can be set very high, since a town
name incorrectly recognized by the speech recognition can be input
again in the spelling process, and a rejection in step S3 does
not result in negative effects on the overall result of the method
according to the invention.

The requirements placed on the probability of a correct speech 
recognition are expediently set so high that the rate of the words 
that are erroneously recognized in step S2 and are, however, 
evaluated as correctly recognized in step S3 is smaller than 3% 
and preferably smaller than 1%.

The spelling process in accordance with step S5 is illustrated
with its individual steps in a flowchart in figure 2.

The user is prompted to fill out the town name in step S7. The 
user is prompted to spell out the letters individually in the 
present exemplary embodiment.

The letter speech signals, which represent an individual letter, 
are detected and recognized in step S8. 

A word list is drawn up in step S9 in accordance with the letter 
signals detected and evaluated in step S8. This word list is drawn
up on the basis of the total vocabulary of all 22,077 town names, 
the individual town names being assigned letter recognition 
probabilities. The letter recognition probability is the 
probability with which one or more spelt out letters of the word 
are correctly recognized with the aid of the detected and 
evaluated letter signals. 

If, for example, the first letter spelt out by the user is a "B",
all the town names that begin with a "B" are assigned high letter
recognition probabilities. Furthermore, all the town names that
start with a "W" are also assigned relatively high letter
recognition probabilities, since a "B" and a "W" sound very similar in
German, and the "W" can therefore be recognized with a relatively high
probability as a correct letter for the detected and evaluated letter signals of
the spelt out "B". Town names that begin with another letter are therefore
assigned substantially lower letter recognition probabilities. However, 
only the town names whose letter recognition probability is not lower 
than the highest determined letter recognition probability minus a 
predetermined threshold value SW1 are accepted in the word list. The 
remaining town names are not taken into account in the following steps. 
The method thereby applied for determining the word list is based on the 
Viterbi algorithm. 

The list therefore includes all town names that begin with a "B"
or a "W" in the case of the input of a "B" as first letter.

A check is made in the next step S10 as to whether the list
includes only a single town name. If this is the case, the method
sequence is transferred in step S11 to the main method in
accordance with figure 1, where the town name determined is then
output in step S6, and the spelling process is terminated.

If, however, the word list includes a plurality of town names, the 
method sequence branches to step S12, in which case a speech 
recognition of the originally input word speech signals is 
executed anew, the speech recognition being based on the word list 
drawn up in step S9 as vocabulary. Since the speech recognition is 
based on the same word speech signals as in step S2, the same 
speech recognition probabilities are determined for the same 
words. This step differs from the speech recognition according to 
step S2 by virtue of the new vocabulary that has been determined 
from the spelt out letters. The town names newly added by contrast 
with the vocabulary originally used in step S2 are firstly 
assigned a speech recognition probability. The speech recognition
probability determined by the speech recognition according to step
S12 is not combined with the letter recognition probability
determined according to the letter recognition of step S8.

However, it is also possible to combine these two recognition 
probabilities as an alternative. They can be combined with one 
another by means of multiplication, for example. 

A check is made in the next step S13 as to whether the highest 
speech recognition probability for a town name is higher than the 
second highest speech recognition probability of a further town 
name in the word list by a predetermined threshold value SW2. If
this is the case, the method sequence branches to step S14 at 
which the method sequence is transferred again to the main method 
in which the town name with the highest speech recognition 
probability is output and the spelling process is terminated. If 
the highest speech recognition probability is not higher than the 
next highest speech recognition probability by the predetermined 
threshold value SW2, the method sequence goes over to step S15 in
which the user is output a signal for speaking the next letter,
and a pause is made for uttering the next letter. The signal is a
short sound signal, for example.

A check is made in step S16 as to whether the user is speaking a
further letter. If the user speaks a further letter, the method
sequence goes over again to step S8 at which the further letter is
detected and recognized. A loop traversal with the steps S8, S9,
S10, S11 or S12, S13, S14 or S15 and S16 is thereby begun.

As in the first loop traversal, a new word list is drawn up in
the case of each further loop traversal in step S9. For this
purpose, the individual town names are again assigned a letter
recognition probability. This letter recognition probability is
determined on the basis of the recognition probabilities with which
the individual letters of the town names have been correctly
recognized by the detected and evaluated letter signals. The letter
recognition probability is calculated by multiplying all the
recognition probabilities of the sequence of spelt out letters of
the town names for which a corresponding sequence of letter signals
has been detected and evaluated. This calculation is executed in
such a way that the letter recognition probability previously
determined in step S8 is combined with, that is to say multiplied
by, the letter recognition probability for the execution of step S8
in the preceding loop traversal.

Again, included in the word list are only the town names whose 
letter recognition probability is not lower than the highest 
determined letter recognition probability minus a predetermined 
threshold value SW1. The remaining town names are not taken into
account in the following steps, that is to say a new list is drawn 
up, individual town names not being taken into account by 
comparison with the previous list, and others being included anew. 
However, the number of words in the word list tends to be reduced
with each loop traversal, since the recognition becomes more
specific the more letter signals are detected and evaluated.

In the case of a plurality of traversals of the loop, the word
list is reduced solely as determined by the letters input during 
spelling out, and the interrogation of step S10 is therefore based 
solely on the letter recognition probability. 

This loop is traversed until, as a result of one of the two
interrogations in steps S10 and S13, a town name has been
determined with the required recognition probability. If steps
S10 and S13 do not lead to termination of the loop, but a
termination of the loop ensues in step S16, that is to say because
it is established that the user is no longer speaking, the method
sequence goes over to step S17 with which either the word that has
the highest recognition probability, or a residual list having, for
example, the three to ten words on the word list that have been
assigned the highest recognition probabilities, is output.

Illustrated in figure 3 is a diagram showing how many letters have 
to be spelt out on average, that is to say how many loop 
traversals must be executed until a predetermined acceptance
rate is reached. Illustrated by dashes in the diagram is the
result for the method according to the invention, which has two
termination criteria at steps S10 and S13. The result for a
conventional method without such termination criteria is drawn
with a continuous line. It may be seen from this diagram that, for
example, after 7 letters have been spelt out an acceptance rate of
only just above 40% is achieved with the known spelling methods,
whereas an acceptance rate of over 80% is already achieved with
the method according to the invention. Substantially fewer letters 
need be spelt out with the method according to the invention than 
is the case for conventional spelling methods. This diagram also 
shows that acceptance rates of 80% to 100% are already achieved 
with the spelling out of six letters. 

Consequently, the method according to the invention can be used to 
limit the average number of letters to be spelt out to five to 
seven.

A successful recognition rate was already achieved for an average
number of 4.9 letters with the aid of the described exemplary
embodiment for detecting and evaluating the town names of Germany.

The present invention is not limited to the automatic detection 
and recognition of town names, but is suitable, in particular, for 
all vocabularies with a limited number of words. It can, however, 
also be used for unlimited vocabularies. The method is then to be 
modified in a way known per se such that when words that are not 
yet included in the total vocabulary are being input by spelling 
them out a routine is executed with the aid of which these words 
are added to the total vocabulary. 

The method according to the invention can also be modified in such 
a way that the individual words are assigned a combined 
recognition probability on the basis of the letter and speech 
recognition. In this case, the word list is drawn up on the basis 
of the combined recognition probability. As a consequence of this, 
the two recognition probabilities of the above described exemplary 
embodiment, on which the termination criteria according to steps 
S10 and S13 are based, are identical, for which reason one 
termination criterion can be deleted. 

According to a simplified method, it is also possible at each loop 
traversal only to remove words from a word list once drawn up. 
Since no new words can be added thereby, the speech recognition of 
step S12 need be executed only once, during the first loop
traversal, since town names set forth in the list have all been 
evaluated already with the corresponding speech recognition 
probability. 

In the above exemplary embodiment, the letters are uttered in
isolation when being spelt out, and are respectively recognized
individually. However, it is also possible for the letters to be
uttered continuously when being spelt out. In the case of such a
continuous method, step S15 (signal for the next letter and pause)
can be eliminated.

Figure 4 shows a device, specifically a telephone communication 
system 1 for an automatic directory inquiry service. The telephone 
communication system 1 is designed as a digitally operating 
telephone communication system having an internal databus 2, a 
central processor unit 3, a memory unit 4, a speech recognition 
unit 5 and a speech output unit 6. Connected to the databus 2 is a 
switching unit 7 via which the analog and digital telephone lines 
8, 9 can be switched to connect to the units 3 to 6 via the 
databus 2. Telephone terminals 10 are connected to the telephone
lines 8, 9, it being possible for one or more exchanges 11 to be
connected in between.

A user can dial the telephone communication system 1 from one of
the telephone terminals 10, by which means he is connected to the
units 3 to 6 via the switching unit 7 and the databus 2.

A plurality of computer programs are stored in the memory unit 4. 
One of these is provided for executing the above described 
exemplary embodiment for recognizing town names. If, in his dialog 
with the telephone communication system, the user reaches a point 
at which the town name needs to be input, an appropriate prompt is 
output to the user by means of the speech output unit 6, and the 
user then speaks a town name. The latter is then detected with the 
aid of the speech recognition unit 5 and evaluated in accordance 
with the above described method, the user being prompted, if 
required, to spell out the town name. 

Since the method according to the invention is not limited to the
recognition of town names, a plurality of programs that operate in
accordance with the method according to the invention can be
provided, these programs each being capable of recognizing words of
specific vocabularies, for example personal names and company names,
numbers, stocks or the like. These computer programs designed
according to the invention are called up and controlled by a
higher level dialog control program.
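
One possible arrangement of such a higher level dialog control program is sketched below. The dispatch table, the field names and the stub recognizers are hypothetical; each stub merely stands in for a complete program operating in accordance with the method.

```python
from typing import Callable, Dict


def recognize_town(utterance: str) -> str:
    """Stub for the program that recognizes town names."""
    return f"town:{utterance}"


def recognize_company(utterance: str) -> str:
    """Stub for the program that recognizes company names."""
    return f"company:{utterance}"


# One recognition program per vocabulary (assumed field names).
RECOGNIZERS: Dict[str, Callable[[str], str]] = {
    "town": recognize_town,
    "company": recognize_company,
}


def dialog_control(field: str, utterance: str) -> str:
    """Call up the recognition program that matches the current dialog field."""
    if field not in RECOGNIZERS:
        raise ValueError(f"no recognition program for field {field!r}")
    return RECOGNIZERS[field](utterance)
```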

These computer programs can also be stored on an electronically
readable data medium and, for example, can be transmitted to
another telephone communication system.

Patent claims

1. A method for detecting and evaluating word speech signals 
representing a word from a user of a speech recognition system, 
having the following steps: 

- detecting the acoustic word speech signals, 

- carrying out a speech recognition operation and assessing 
the probability of correct speech recognition, 

- if the speech recognition does not reach the desired 
probability, the user is prompted to spell out the word, 

- detecting and evaluating the letter signals spelt out by 
the user, 

- carrying out a word recognition operation after the 
respective detection of the letter signals representing a single 
letter, 

- assessing the probability of correct word recognition, 

- if a word is obtained with the desired probability with 
the aid of the word recognition, the spelling process is 
terminated and the word is output, 

- the speech recognition operation being executed on the 
basis of a smaller vocabulary than does the word recognition 
operation.
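
The sequence of steps above can be illustrated with a short Python sketch. The two recognizers are deliberately simple stand-ins (exact matching and a uniform probability over the remaining candidates), and all names, vocabularies and the value of the desired probability are assumptions made for illustration.

```python
def speech_recognition(utterance, small_vocab):
    """Stand-in speech recognizer using the smaller vocabulary."""
    if utterance in small_vocab:
        return utterance, 0.95
    return None, 0.0


def word_recognition(letters, large_vocab):
    """Stand-in word recognizer using the larger vocabulary."""
    prefix = "".join(letters).lower()
    candidates = [w for w in large_vocab if w.lower().startswith(prefix)]
    if not candidates:
        return None, 0.0
    # Assumed probability model: uniform over the remaining candidates.
    return candidates[0], 1.0 / len(candidates)


def detect_word(utterance, spelling, small_vocab, large_vocab, desired=0.9):
    """Return (word, number of letters the user had to spell out)."""
    word, prob = speech_recognition(utterance, small_vocab)
    if prob >= desired:
        return word, 0  # no spelling required
    letters = []
    for letter in spelling:  # the user has been prompted to spell the word
        letters.append(letter)
        # Word recognition is carried out after every single letter.
        word, prob = word_recognition(letters, large_vocab)
        if prob >= desired:
            return word, len(letters)  # terminate spelling early
    return None, len(letters)


small_vocab = {"Berlin", "Hamburg"}
large_vocab = ["Berlin", "Hamburg", "Hameln", "Hanau", "Bremen"]
```

In this toy setup `detect_word("Hameln", "HAMELN", small_vocab, large_vocab)` stops after four letters, because "HAME" leaves a single candidate in the larger vocabulary.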

2. The method as claimed in claim 1, characterized in that,
during the word recognition operation, a word list is drawn up in
accordance with the detected letter speech signals, the words of a
total vocabulary in each case being assigned a letter recognition
probability on the basis of the letter speech signals, and the
word list comprising all words whose letter recognition
probability is not lower than the highest determined letter
recognition probability of a word minus a threshold value (SW1).

3. The method as claimed in claim 2, characterized in that a
check is made as to whether the word list contains only a single
word, and if only a single word is contained the latter is output
and the spelling process is terminated.
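
The word list construction and the single-word check can be sketched as follows. The probabilities and the values chosen for the threshold SW1 are hypothetical, as is the function name.

```python
def build_word_list(letter_probs, sw1):
    """All words whose letter recognition probability is not lower than
    the highest determined probability minus the threshold SW1."""
    if not letter_probs:
        return []
    best = max(letter_probs.values())
    return [word for word, p in letter_probs.items() if p >= best - sw1]


# Hypothetical letter recognition probabilities over the total vocabulary.
letter_probs = {"Hameln": 0.8, "Hamburg": 0.7, "Hanau": 0.25}

wide_list = build_word_list(letter_probs, sw1=0.25)    # two candidates remain
narrow_list = build_word_list(letter_probs, sw1=0.05)  # a single candidate

# Single-word check: output the word and stop spelling if only one remains.
output = narrow_list[0] if len(narrow_list) == 1 else None
```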


4. The method as claimed in claim 2 or 3, characterized in that
a renewed speech recognition of the word speech signals is carried
out in which the words of the word list are respectively assigned
a speech recognition probability, and a check is made as to
whether the highest and the second highest speech recognition
probability differ from one another by a predetermined threshold
value (SW2), and if this is the case the word of the word list
with the highest speech recognition probability is output, and the
spelling process is terminated.
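
A minimal sketch of this margin check is given below. It reads "differ by a predetermined threshold value" as "differ by at least SW2", which is an interpretation; the function name and the scores are hypothetical.

```python
def pick_by_margin(speech_probs, sw2):
    """Return the best word if it leads the runner-up by at least SW2."""
    ranked = sorted(speech_probs.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        # With fewer than two candidates there is no margin to evaluate.
        return None
    (best_word, best_p), (_, second_p) = ranked[0], ranked[1]
    return best_word if best_p - second_p >= sw2 else None


# Hypothetical speech recognition probabilities for the words of the word list.
speech_probs = {"Hameln": 0.7, "Hamburg": 0.4, "Hanau": 0.1}
```

Here `pick_by_margin(speech_probs, sw2=0.2)` yields "Hameln", whereas a stricter `sw2=0.5` returns `None` and spelling would continue.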


5. A method for detecting and evaluating word speech signals
representing a word from the user of a speech recognition system,
in particular as claimed in one of the preceding claims, having
the following steps:

- detecting the acoustic word speech signals,

- carrying out a speech recognition operation and assessing
the probability of correct speech recognition,

- if the speech recognition does not reach the desired
probability, the user is prompted to spell out the word,

- detecting and evaluating the letter signals spelt out by
the user,

- carrying out a word recognition operation after the
respective detection of the letter signals representing a single
letter,

- assessing the probability of correct word recognition,

- if a word is obtained with the desired probability with
the aid of the word recognition, the spelling process is
terminated and the word is output,
and

- a renewed speech recognition of the word speech signals
is carried out taking account of the detected letter speech
signals.



6. A method for detecting and evaluating word speech signals
representing a word from a user of a speech recognition system, in
particular as claimed in one of the preceding claims, having the
following steps:

- detecting the acoustic word speech signals,

- carrying out a speech recognition operation and assessing
the probability of correct speech recognition,

- if the speech recognition does not reach the desired
probability, the user is prompted to spell out the word,

- detecting and evaluating the letter signals spelt out by
the user,

- carrying out a word recognition operation after the
respective detection of the letter signals representing a single
letter,

- assessing the probability of correct word recognition,

- if a word is obtained with the desired probability with
the aid of the word recognition, the spelling process is
terminated and the word is output,

- during the word recognition operation a letter recognition
probability being determined on the basis of the detected and
evaluated letter signals and being combined with a speech
recognition probability determined on the basis of the detected
and evaluated word speech signals to form a combined recognition
probability.

7. The method as claimed in claim 6, characterized in that a
word list is drawn up in accordance with the combined recognition
probability.

8. The method as claimed in claim 6 or 7, characterized in that
a check is made solely with the aid of a single interrogation as
to whether a word is obtained with the desired recognition
probability, the combined recognition probability being used as
recognition probability.






9. The method as claimed in one of claims 1 to 8, characterized
in that the spelling process is terminated by outputting an
appropriate message to the user and by ending the method for
detecting and evaluating a word.




10. The method as claimed in one of claims 2 to 9, characterized
in that when the spelling process has not yet been terminated,
after the detection and evaluation of the letter speech signals
respectively representing a letter, a check is made as to whether
the user is continuing to speak, and if he is continuing to speak
the next speech signals respectively representing a letter are
detected, and if the user is not continuing to speak the word list
or a predetermined number of the words with the highest
probability of the word list is output.
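
The branch described in this claim might look as follows in code. This is only a sketch: the function name, the action labels and the example probabilities are hypothetical.

```python
def next_step(user_still_speaking, word_probs, max_words):
    """Decide what to do after a letter has been detected and evaluated."""
    if user_still_speaking:
        # Continue with the next speech signals representing a letter.
        return "detect_next_letter", []
    # Otherwise output the most probable words of the word list.
    ranked = sorted(word_probs, key=word_probs.get, reverse=True)
    return "output", ranked[:max_words]


# Hypothetical word list with probabilities.
word_probs = {"Hanau": 0.1, "Hameln": 0.6, "Hamburg": 0.3}
```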






11. A device that is set up, and has means, for carrying out a 
method as claimed in one of claims 1 to 10. 

12. The device as claimed in claim 11, characterized in that
the device is a telephone communication system (1) that has a
switching unit (7) with the aid of which telephone lines (8, 9)
can be connected to the internal databus (2).

13. A computer program product for a data processing system that
contains software code sections with the aid of which a method as
claimed in at least one of claims 1 to 10 can be executed on a
data processing system.

Abstract

Method for detecting and evaluating word speech signals 
representing a word from a user of a speech recognition system 

In the event of a possibly incorrect speech recognition, the
method according to the invention prompts the user to spell out
the corresponding word. After each spelt-out letter, word
recognition is executed such that the spelling process can be
terminated given a satisfactory recognition probability.



Figure 2 



[FIG 4, drawing sheet 4/4: telephone communication system 1; drawing not reproduced]



1999P02738WOUS 

Declaration and Power of Attorney For Patent Application

German Language Declaration



As a below named inventor, I hereby declare that:

My residence, post office address and citizenship are as stated
below next to my name,



I believe I am the original, first and sole inventor (if only
one name is listed below) or an original, first and joint
inventor (if plural names are listed below) of the
subject matter which is claimed and for which a patent
is sought on the invention entitled

Method and device for detecting and
evaluating vocal signals representing a
word emitted by a user of a
voice-recognition system

(German title: Verfahren zum Erfassen und Auswerten
von ein Wort darstellenden
Wortsprachsignalen eines Benutzers
eines Spracherkennungssystems)



the specification of which

(check one)

□ is attached hereto.

☒ was filed on 31.05.2000 as

PCT international application

PCT Application No. PCT/DE00/01787

and was amended on

(if applicable)



I hereby state that I have reviewed and understand the
contents of the above identified specification, including
the claims as amended by any amendment referred to
above.



I acknowledge the duty to disclose information which is
material to the examination of this application in
accordance with Title 37, Code of Federal Regulations,
§1.56(a).



I hereby claim foreign priority benefits under Title 35,
United States Code, §119 of any foreign application(s)
for patent or inventor's certificate listed below and have
also identified below any foreign application for patent
or inventor's certificate having a filing date before that
of the application on which priority is claimed:



Prior foreign applications

Number        Country   Day Month Year Filed   Priority Claimed
19942172.2    DE        03.09.1999             Yes



I hereby claim the benefit under Title 35, United States
Code, §120 of any United States application(s) listed
below and, insofar as the subject matter of each of the
claims of this application is not disclosed in the prior
United States application in the manner provided by
the first paragraph of Title 35, United States Code,
§122, I acknowledge the duty to disclose material
information as defined in Title 37, Code of Federal
Regulations, §1.56(a) which occurred between the filing
date of the prior application and the national or PCT
international filing date of this application.



Application Serial No.   Filing Date (D, M, Y)   Status (patented, pending, abandoned)
PCT/DE00/01787           31.05.2000              pending



I hereby declare that all statements made herein of my
own knowledge are true and that all statements made
on information and belief are believed to be true, and
further that these statements were made with the
knowledge that willful false statements and the like so
made are punishable by fine or imprisonment, or both,
under Section 1001 of Title 18 of the United States
Code and that such willful false statements may
jeopardize the validity of the application or any patent
issued thereon.



POWER OF ATTORNEY: As a named inventor, I
hereby appoint the following attorney(s) and/or
agent(s) to prosecute this application and transact all
business in the Patent and Trademark Office
connected therewith (list name and registration
number)



Customer No. 21171 



And I hereby appoint 



Direct Telephone Calls to: (name and telephone
number)

Ext.

Send Correspondence to:

Staas & Halsey LLP
700 Eleventh Street NW, Suite 500, Washington, DC 20001
Telephone: (001) 202 434 1500 and Facsimile (001) 202 434 1501

Customer No. 21171



Full name of sole or first inventor: JOSEF BAUER
Inventor's signature:
Date: 21. [illegible]
Residence: MUENCHEN, GERMANY
Citizenship: DE
Post Office Address: WINZERER STR. 96, 80797 MUENCHEN

Full name of second joint inventor, if any: Dr. JOCHEN JUNKAWITSCH
Second Inventor's signature:
Date:
Residence: Eschenbach, GERMANY
Citizenship: DE
Post Office Address: Sonnenstr. 24, 92676 Eschenbach



(Supply similar information and signature for third and
subsequent joint inventors).

Form PTO-FB-240 (8-83)

Patent and Trademark Office-U.S. Department of COMMERCE



Full name of third joint inventor: Dr. TOBIAS SCHNEIDER
Inventor's signature:
Date:
Residence: MUENCHEN, GERMANY
Citizenship: DE
Post Office Address: KRANZHORNSTR. 7, 81825 MUENCHEN

